Abstract

This work is in the line of designing efficient checkers for testing the reliability of some
massive data structures. Given sequential access to the insert/extract operations on such a
structure, one would like to decide, a posteriori only, whether it corresponds to the evolution of a
reliable structure. In a context of massive data, one would like to minimize both the amount of
reliable memory of the checker and the number of passes on the sequence of operations.

Chu, Kannan and McGregor [9] initiated the study of checking priority queues in this setting.
They showed that the use of timestamps allows one to check a priority queue with a single pass and
memory space $\tilde{O}(\sqrt{N})$. Later, Chakrabarti, Cormode, Kondapally and McGregor [7] removed
the use of timestamps, and proved that more passes do not help.

We show that, even in the presence of timestamps, more passes do not help, solving an open
problem of [9, 7]. On the other hand, we show that a second pass, but in reverse direction,
shrinks the memory space to $O((\log N)^2)$, extending a phenomenon first observed by
Magniez, Mathieu and Nayak [15] for checking well-parenthesized expressions.

1 Introduction



The reliability of memory is central and becomes challenging when it is massive. In the context 
of program checking [4] this problem has been addressed by Blum, Evans, Gemmell, Kannan and 
Naor [3]. They designed on-line checkers that use a small amount of reliable memory to test the 
behavior of some data structures. Checkers are allowed to be randomized and to err with small 
error probability. In that case the error probability is not over the inputs but over the random 
coins of the algorithm. 

Chu, Kannan and McGregor [9] revisited this problem for priority queue data structures, where 
the checker only has to detect an error after processing an entire sequence of data accesses. This can 
be rephrased as a one-pass streaming recognition problem. Streaming algorithms sequentially scan 
the whole input piece by piece in one sequential pass, or in a small number of passes, while using 
sublinear memory space. In our context, the stream is defined by the sequence of insertions and 
extractions on the priority queue. Using a streaming algorithm, the objective is then to decide if 
the stream corresponds to a correct implementation of a priority queue. We also consider collection 
data structures that implement multisets. 



* Supported by the French ANR Defis program under contract ANR-08-EMER-012 (QRAC project) 
Definition 1 (Collection, PQ). Let $\Sigma_0$ be some alphabet. Let $\Sigma = \{\mathrm{ins}(a), \mathrm{ext}(a) : a \in \Sigma_0\}$.
For $w \in \Sigma^N$, define inductively multisets $M_i$ by $M_0 = \emptyset$, $M_i = M_{i-1} \setminus \{a\}$ if $w[i] = \mathrm{ext}(a)$, and
$M_i = M_{i-1} \cup \{a\}$ if $w[i] = \mathrm{ins}(a)$.

Then $w \in \mathrm{Collection}(\Sigma_0)$ if and only if $M_N = \emptyset$ and $a \in M_{i-1}$ when $w[i] = \mathrm{ext}(a)$, for
$i = 1, \dots, N$. Moreover, $w \in \mathrm{PQ}(U)$, for $U \in \mathbb{N}$, if and only if $w \in \mathrm{Collection}(\{0, 1, \dots, U\})$
and $a = \max(M_{i-1})$ when $w[i] = \mathrm{ext}(a)$, for $i = 1, \dots, N$.
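As a point of reference, Definition 1 can be checked directly when memory is not a concern. The following Python sketch is not a streaming algorithm: it stores the whole multiset, so it uses linear memory. The tuple encoding `("ins", a)` / `("ext", a)` and the function names are illustrative assumptions, not notation from the paper.

```python
import heapq
from collections import Counter

def is_collection(w):
    """Definition 1: each ext(a) must remove an element present in M_{i-1},
    and the final multiset M_N must be empty."""
    multiset = Counter()
    for op, a in w:
        if op == "ins":
            multiset[a] += 1
        else:
            if multiset[a] == 0:      # a is not in M_{i-1}
                return False
            multiset[a] -= 1
    return sum(multiset.values()) == 0

def is_pq(w, U):
    """Definition 1: in addition, each ext(a) must extract max(M_{i-1})."""
    heap = []                          # max-heap simulated with negated values
    for op, a in w:
        if not 0 <= a <= U:
            return False
        if op == "ins":
            heapq.heappush(heap, -a)
        elif not heap or -heap[0] != a:
            return False
        else:
            heapq.heappop(heap)
    return not heap
```

For instance, the sequence ins(1), ins(3), ext(1), ext(3) is in Collection but not in PQ, since 1 is extracted while 3 is still present.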

Streaming algorithms were initially designed with a single pass: when a piece of the stream has
been read, it is gone forever. This makes those algorithms of practical interest in online contexts,
such as network monitoring, for which the first streaming algorithms were developed. Motivated
by the explosion in the size of the data that algorithms are called upon to process in everyday
real-time applications, the area of streaming algorithms has experienced tremendous growth over
the last decade in many applications. In particular, a streaming algorithm can model an external
read-only memory. Examples of such applications occur in bioinformatics for genome decoding, or
in Web databases for the search of documents. In that context, considering multi-pass streaming
algorithms is relevant.

Using standard arguments one can establish that every $p$-pass randomized streaming algorithm
needs memory space $\Omega(N/p)$ for recognizing Collection. Nonetheless, Chakrabarti, Cormode,
Kondapally and McGregor [7] gave a one-pass randomized algorithm for PQ using memory space $\tilde{O}(\sqrt{N})$.
They also showed that several passes do not help, since any $p$-pass randomized algorithm would
require memory space $\Omega(\sqrt{N}/p)$. A similar lower bound was shown independently, but using
different tools, by Jain and Nayak [10]. The case of a single pass was established previously by
Magniez, Mathieu and Nayak [15] for checking the well-formedness of parenthesis expressions, or
equivalently the behavior of a stack.

A simpler variant of PQ with timestamps was in fact first studied by Chu, Kannan and
McGregor [9], where now each item is inserted to the queue with its index.

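Definition 2 (PQ-TS). Let $\Sigma = \{\mathrm{ins}(a), \mathrm{ext}(a) : a \in \{0, 1, \dots, U\}\} \times \mathbb{N}$. Let $w \in \Sigma^N$. Then
$w \in \mathrm{PQ\text{-}TS}(U)$ if and only if $w \in \mathrm{Collection}(\Sigma)$, $w[1, \dots, N][1] \in \mathrm{PQ}(U)$, and $w[i][2] = i$ when
$w[i][1] = \mathrm{ins}(a)$.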

Nonetheless the two works [9, 7] left two problems open. The lower bound of [7] was only proved
for PQ, and no significant lower bound for PQ-TS was established. Moreover, the streaming
complexity of PQ for algorithms that can process the stream in any direction has not been studied.

Even though recognizing PQ-TS is obviously easier than recognizing PQ, our first contribution
(Section 3) consists in showing that they both obey the same limitation, even with multiple passes
in the same direction.

Theorem 3. Every $p$-pass randomized streaming algorithm recognizing PQ-TS$(3N/2)$ with bounded
error $1/3$ requires memory space $\Omega(\sqrt{N}/p)$ for inputs of length $N$.

As a consequence, since this lower bound uses very restricted hard instances, it covers most
possible variations. For instance, assuming that the input is in Collection and has no duplicates
is not sufficient to guarantee a faster algorithm. The proof of Theorem 3 consists in introducing
a related communication problem with $\Theta(\sqrt{N})$ players. Then we reduce the number of players
to 3, and prove a lower bound on the information carried by players, leading to the desired lower
bound. We are following the information cost approach taken in [8, 17, 2, 12, 11], among other
works. Recently, the information cost appeared as one of the most central notions in communication
complexity [6, 5, 13]. The information cost of a protocol is the amount of information that messages
carry about players' inputs. We adapt this notion to suit both the nature of streaming algorithms
and of our problem.

Even if our result suggests that allowing multiple passes does not help, one could also consider
the case of bidirectional passes. We believe that it is a natural relaxation of multi-pass streaming
algorithms where the stream models some external read-only memory. In that case, we show
that a second pass, but in reverse order, makes the problem of checking PQ easy, even with no
timestamps (Section 4). A similar phenomenon has been established previously in [15] for checking
the well-formedness of parenthesis expressions. Their problem is simpler than ours, and therefore
our algorithm is more general.

Theorem 4. There is a bidirectional 2-pass randomized streaming algorithm recognizing PQ$(U)$
with memory space $O((\log N)(\log U + \log N))$, time per processing item $\mathrm{polylog}(N, U)$, and one-sided
bounded error $N^{-c}$, for inputs of length $N$ and any constant $c > 0$.

Our algorithm uses a hierarchical data structure similar to the one introduced in [15] for checking
well-parenthesized expressions. At a high level, it also behaves similarly. It performs one pass in each
direction and makes an on-line compression of past information in at most $\log N$ hashcodes. While
this compression can lose information, the compression technique ensures that a mistake is always
detected in one of the two directions. Nonetheless our algorithm differs on two main points. First,
unlike parenthesized expressions, PQ is not symmetric. Therefore one has to design an algorithm
for each pass. Second, the one-pass algorithm for PQ [7] is technically more advanced than the one
of [15]. Thus designing a bidirectional 2-pass algorithm for PQ is more challenging.

Theorems 3 and 4 point out a strange situation, but not an isolated one. Languages studied
in [9, 15, 7, 14] and in this paper have space complexity $O(\sqrt{N}\,\mathrm{polylog}(N))$ for a single pass,
$\Omega(\sqrt{N}/p)$ for $p$ passes in the same direction, and $\mathrm{polylog}(N)$ for 2 passes but one in each direction.
We hope this paper makes progress in the study of that phenomenon.

2 Preliminaries 

In streaming algorithms (see [16] for an introduction), a pass on an input $w \in \Sigma^N$, for some alphabet
$\Sigma$, means that $w$ is given as an input stream $w[1], w[2], \dots, w[N]$, which arrives sequentially, i.e.,
letter by letter in this order. For simplicity, we assume throughout this article that the input length
$N$ is always given to the algorithm in advance. Nonetheless, all our algorithms can be adapted to
the case in which $N$ is unknown until the end of a pass.

Definition 5 (Streaming algorithm). A $p$-pass randomized streaming algorithm with space $s(N)$
and time $t(N)$ is a randomized algorithm that, given $w \in \Sigma^N$ as an input stream,

• performs $p$ sequential passes on $w$;

• maintains a memory space of size at most $s(N)$ bits while reading $w$;

• has running time at most $t(N)$ per processed letter $w[i]$;

• has preprocessing and postprocessing time at most $t(N)$.

The algorithm is bidirectional if it is allowed to access the input in the reverse order, after
reaching the end of the input. Then $p$ is the total number of passes in either direction.
The proof of our lower bound uses the language of multi-player communication complexity,
and is based on information theory arguments. We consider number-in-hand and message-passing
communication protocols. Each player is given some input, and can communicate with another
player according to the rules of the protocol. Our players are embedded into a directed circle,
so that each player can receive (resp. transmit) a message from (resp. to) its unique predecessor (resp.
successor). Each player sends a message after receiving one, until the end of the protocol is reached.
Players have no space or time restriction. Only the number of rounds and the size of messages
are constrained.

Consider a randomized multi-player communication protocol $P$. We consider only two types of
random source, which we call coins. Each player has access to its own independent source of private
coins. In addition, all players share another common source of public coins. The output of $P$ is
announced by the last player. This is therefore the last message of the last player. We say that $P$
is with bounded error $\varepsilon$ when $P$ errs with probability at most $\varepsilon$ over the private and public coins.
The transcript $\Pi$ of $P$ is the concatenation of all messages sent by all players, including all public
coins. In particular, it contains the output of $P$, since it is given by the last player. Given a subset
$S$ of players, we let $\Pi_S$ be the concatenation of all messages sent by players in $S$, including again
all public coins.

We now recall the usual notions of entropy $H$ and mutual information $I$. Let $X, Y, Z$ be random
variables. Then $H(X) = -\mathbb{E}_{x \leftarrow X} \log \Pr(X = x)$, $H(X|Y = y) = -\mathbb{E}_{x \leftarrow (X|Y=y)} \log \Pr(X = x \,|\, Y = y)$,
$H(X|Y) = \mathbb{E}_{y \leftarrow Y} H(X|Y = y)$, and $I(X : Y|Z) = H(X|Z) - H(X|Y,Z)$. The entropy and the
mutual information are non-negative and satisfy $I(X : Y|Z) = I(Y : X|Z)$.
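The definitions above can be evaluated numerically on a small joint distribution. The Python sketch below is only an illustration: the dictionary representation of a joint distribution and the helper names are assumptions, and conditional entropy is computed through the standard chain rule $H(A|B) = H(A,B) - H(B)$, which agrees with the averaged definition given above.

```python
from collections import defaultdict
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginal of a joint distribution on the coordinates listed in idx."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[tuple(key[i] for i in idx)] += p
    return out

def cond_entropy(joint, a, b):
    """H(A|B) = H(A,B) - H(B), with a and b tuples of coordinate indices."""
    return entropy(marginal(joint, a + b)) - entropy(marginal(joint, b))

def cond_mutual_info(joint, x, y, z):
    """I(X:Y|Z) = H(X|Z) - H(X|Y,Z)."""
    return cond_entropy(joint, x, z) - cond_entropy(joint, x, y + z)

# Example: X is a uniform bit, Y = X, and Z is an independent uniform bit.
joint = {(0, 0, 0): 0.25, (0, 0, 1): 0.25, (1, 1, 0): 0.25, (1, 1, 1): 0.25}
i_xy = cond_mutual_info(joint, (0,), (1,), (2,))
i_yx = cond_mutual_info(joint, (1,), (0,), (2,))
```

In this example both `i_xy` and `i_yx` evaluate to one bit, which illustrates the symmetry $I(X : Y|Z) = I(Y : X|Z)$.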

The mutual information between two random variables is connected to the Hellinger distance h 
between their respective distribution probabilities. Given a random variable X we also denote by 
X its underlying distribution. 

Proposition 6 (Average encoding). Let $X, Y$ be random variables. Then $\mathbb{E}_{y \leftarrow Y}\, h^2(X|_{Y=y}, X) \le
\kappa\, I(X : Y)$, where $\kappa = \frac{\ln 2}{2}$.

The Hellinger distance also generalizes the cut-and-paste property of deterministic protocols to 
randomized ones. 

Proposition 7 (Cut and paste). Let $P$ be a 2-player randomized protocol. Let $\Pi(x,y)$ denote the
random variable representing the transcript in $P$ when Players $A, B$ have resp. inputs $x, y$. Then
$h(\Pi(x,y), \Pi(u,v)) = h(\Pi(x,v), \Pi(u,y))$, for all pairs $(x,y)$ and $(u,v)$.

Last, we use that the square of the Hellinger distance is convex, and the following connection to
the more conventional $\ell_1$-distance: $h(X,Y)^2 \le \frac{1}{2}\|X - Y\|_1 \le \sqrt{2}\, h(X,Y)$. For a reference on these
results, see [10].
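The relation between the Hellinger distance and the $\ell_1$-distance can be checked numerically. The Python sketch below assumes the normalization $h(P,Q)^2 = 1 - \sum_i \sqrt{p_i q_i}$, which is one common convention and is not spelled out in the text above; the function names are illustrative.

```python
from math import sqrt

def hellinger(P, Q):
    """Hellinger distance, assuming the convention h^2 = 1 - sum_i sqrt(p_i * q_i)."""
    bc = sum(sqrt(p * q) for p, q in zip(P, Q))   # Bhattacharyya coefficient
    return sqrt(max(0.0, 1.0 - bc))               # guard against rounding below 0

def l1_distance(P, Q):
    """The l1-distance between two distributions on the same finite support."""
    return sum(abs(p - q) for p, q in zip(P, Q))

P = [0.5, 0.5]
Q = [0.9, 0.1]
h = hellinger(P, Q)
half_l1 = 0.5 * l1_distance(P, Q)
# Under the stated convention, h**2 <= half_l1 <= sqrt(2) * h holds here.
```

With these two distributions, `half_l1` is 0.4 and lies between `h**2` and `sqrt(2) * h`, as the inequality predicts.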

3 Lower bound for PQ-TS 

The proof of our lower bound consists in first translating it into a 3m-player communication prob- 
lem, for some large m; then reducing the number of players to 3 using the information cost approach; 
and last studying the base case of 3 players using information theory arguments. 
Figure 1: Left: Instance of Raindrops$(m, 4)$ with one error: 17 is extracted after 16. Insertions $a_i$
are circled. Right: Cutting Raindrops$(m, 4)$ into $3m$ pieces to make it a communication problem.
Players' inputs are within each corresponding region.



3.1 From streaming algorithms to communication protocols

In this section, we write $a$ instead of ins$(a)$ and $\bar{a}$ instead of ext$(a)$. Consider the following set of
hard instances of size $N = (2n + 2)m$:

Raindrops$(m, n)$ (see LHS of Figure 1)

• For $i = 1, 2, \dots, m$, repeat the following motif:

- For $j = 1, 2, \dots, n$, insert either $v_{i,j} = 3(ni - j)$ or $v_{i,j} = 3(ni - j) + 2$

- Insert either $a_i = 3(ni - (k_i - 1)) + 1$ or $a_i = 3(ni - k_i) + 1$, for some $k_i \in
\{2, \dots, n\}$

- Extract $v_{i,1}, v_{i,2}, \dots, v_{i,k_i-1}, a_i$ in decreasing order

• Extract everything left in decreasing order

Observe that such an instance is in Collection. One can compute the timestamps for each
value by maintaining only $O(\log N)$ additional bits. Last, there is only one potential error in each
motif that can put it outside of PQ-TS. Indeed, $v_{i,1}, v_{i,2}, \dots, v_{i,k_i-1}, a_i$ are in decreasing order
up to a switch between $a_i$ and $v_{i,k_i-1}$.
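The construction above can be made concrete with a small generator. The Python sketch below builds a Raindrops instance without timestamps from bits $x_i[j]$ and $d_i$, using the encoding $v_{i,j} = 3(ni - j) + 2x_i[j]$ and $a_i = 3(ni - k_i) + 1 + 3d_i$, and checks it with a naive linear-memory priority queue simulation. The function names and the tuple encoding of letters are assumptions made for illustration.

```python
import heapq

def raindrops(x, k, d):
    """Build a Raindrops(m, n) stream. x[i-1][j-1] is the bit x_i[j],
    k[i-1] is k_i in {2..n}, d[i-1] is the bit d_i."""
    m, n = len(x), len(x[0])
    stream, pending = [], []
    for i in range(1, m + 1):
        xi, ki, di = x[i - 1], k[i - 1], d[i - 1]
        v = [3 * (n * i - j) + 2 * xi[j - 1] for j in range(1, n + 1)]
        a = 3 * (n * i - ki) + 1 + 3 * di
        stream += [("ins", val) for val in v] + [("ins", a)]
        # Extract v_{i,1..k_i-1} and a_i in decreasing order.
        stream += [("ext", val) for val in sorted(v[:ki - 1] + [a], reverse=True)]
        pending += v[ki - 1:]          # v_{i,k_i..n} stay in the queue for now
    stream += [("ext", val) for val in sorted(pending, reverse=True)]
    return stream

def is_pq(stream):
    """Naive check: every extraction must remove the current maximum."""
    heap = []
    for op, a in stream:
        if op == "ins":
            heapq.heappush(heap, -a)
        elif not heap or -heap[0] != a:
            return False
        else:
            heapq.heappop(heap)
    return not heap
```

Under this encoding, motif $i$ contains an error exactly when $d_i = 0$ and $x_i[k_i] = 1$, because $v_{i,k_i}$ is then larger than $a_i$ and still in the queue when $a_i$ is extracted.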

Given such an instance as a stream, an algorithm for PQ-TS must decide if an error occurs
between $a_i$ and $v_{i,k_i}$, for some $i$. Intuitively, if the memory space is less than $\varepsilon n$, for a small enough
constant $\varepsilon > 0$, then the algorithm cannot remember all the values $(v_{i,j})_j$ when $a_i$ is extracted, and
therefore cannot check a potential error with $a_i$. The next opportunity is during the last sequence
of extractions. But then, the algorithm has to remember all values $(a_i)_i$, which is again impossible
if the memory space is less than $\varepsilon m$.

In order to formalize this intuition, Lemma 8 (proof in Appendix A) first translates our problem
into a communication one between $3m$ players as shown on the RHS of Figure 1. Then we analyze
its complexity using information theory arguments in Section 3.2.
Any insertion and extraction of an instance in Raindrops$(m, n)$ can be described by its index
and a single bit. Let $x_i[j] \in \{0, 1\}$ be such that $v_{i,j} = 3(ni - j) + 2x_i[j]$. Similarly, let $d_i \in \{0, 1\}$ be such
that $a_i = 3(ni - k_i) + 1 + 3d_i$. For simplicity, we write $x$ instead of $(x_i)_{1 \le i \le m}$. Similarly, we use
the notations $k$ and $d$. Then our related communication problem is:

WeakIndex$(m, n)$

• Input for players $(A_i, B_i, C_i)_{1 \le i \le m}$:

- Player $A_i$ has a sequence $x_i \in \{0, 1\}^n$

- Player $B_i$ has $x_i[1, k_i - 1]$, with $k_i \in \{2, \dots, n\}$ and $d_i \in \{0, 1\}$

- Player $C_i$ has $x_i[k_i, n]$

• Output: $f_m(x,k,d) = \bigvee_{i=1}^{m} f(x_i, k_i, d_i)$, where $f(x,k,d) = [(d = 0) \wedge (x[k] = 1)]$

• Communication settings:

- One round: each player sends a message to the next player according to the
diagram $A_1 \to B_1 \to A_2 \to \cdots \to B_m \to C_m \to C_{m-1} \to \cdots \to C_1$.

- Multiple rounds: If there is at least one round left, $C_1$ sends a message to $A_1$,
and then players continue with the next round.

Lemma 8. Assume there is a $p$-pass randomized streaming algorithm for deciding if an instance
of Raindrops$(n, m)$ is in PQ-TS$(3mn)$ with memory space $s(m,n)$ and bounded error $\varepsilon$. Then
there is a $p$-round randomized protocol for WeakIndex$(n, m)$ with bounded error $\varepsilon$ such that each
message has size at most $s(m,n)$.

We are now ready to give the structure of the proof of Theorem 3, which relies on techniques from
information theory. Define the following collapsing distribution $\mu_0$ of hard inputs $(x,k,d)$, encoding
instances of Raindrops$(1, n)$, where $f$ always takes value 0. Distribution $\mu_0$ is such that $(x,k)$ is
uniform on $\{0, 1\}^n \times \{2, \dots, n\}$ and, given $x, k$, the bit $d \in \{0, 1\}$ is uniform if $x[k] = 0$, and $d = 1$
if $x[k] = 1$. From now on, $(X, K, D)$ are random variables distributed according to $\mu_0$, and $(x, k, d)$
denote any of their values.
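For small $n$, the collapsing distribution can be enumerated exactly. The Python sketch below lists the support of $\mu_0$ with exact rational weights, so that one can confirm that $f$ vanishes on the whole support. The function names and the tuple representation of $x$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def f(x, k, d):
    """f(x, k, d) = [(d = 0) and (x[k] = 1)], with k 1-indexed as in the text."""
    return int(d == 0 and x[k - 1] == 1)

def mu0(n):
    """Exact collapsing distribution as a dict {(x, k, d): probability}."""
    dist = {}
    base = Fraction(1, 2 ** n * (n - 1))   # (x, k) uniform on {0,1}^n x {2..n}
    for x in product((0, 1), repeat=n):
        for k in range(2, n + 1):
            if x[k - 1] == 1:
                dist[(x, k, 1)] = base     # d is forced to 1
            else:
                dist[(x, k, 0)] = base / 2 # d is a uniform bit
                dist[(x, k, 1)] = base / 2
    return dist

dist = mu0(4)
total = sum(dist.values())
prob_d_is_one = sum(p for (x, k, d), p in dist.items() if d == 1)
```

A consequence of the definition is that $D = 1$ with probability $3/4$, which the enumeration reproduces for $n = 4$.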

Then the proof of Theorem 3 consists in studying the information cost of any communication
protocol for WeakIndex$(n, m)$, which is a lower bound on its communication complexity.
Using that $\mu_0$ is collapsing for $f$, Lemma 9 establishes a direct sum on the information cost of
WeakIndex$(n, m)$. Then, even if $f$ is constant on $\mu_0$, Lemma 12 lower bounds the information
cost of a single instance of WeakIndex$(n, 1)$.

Proof of Theorem 3. Let $n, N$ be positive integers such that $N = (2n + 2)n$. Assume that there
exists a $p$-pass randomized algorithm that recognizes PQ-TS$(3N/2)$, with memory space $\alpha n$ and
bounded error $\varepsilon$, for inputs of size $N$. Then, by Lemma 8, there is a $p$-round randomized protocol $P$ for
WeakIndex$(n, n)$ such that each message has size at most $\alpha n$. By Lemma 9, one can derive from
$P$ another $(p+1)$-round randomized protocol $P'$ for WeakIndex$(n, 1)$ with bounded error $\varepsilon$, and
transcript $\Pi'$ satisfying $|\Pi'| \le 3(p+1)\alpha n$ and $\max\{I(D : \Pi'_B | X, K), I(K, D : \Pi'_C | X)\} \le (p+1)\alpha$.
Then by Lemma 12, $3(p+1)\alpha \ge (1 - 2\varepsilon)/10$, that is $\alpha = \Omega(1/p)$, concluding the proof. □
3.2 Communication complexity lower bound

We first reduce the general problem WeakIndex$(n, m)$ with $3m$ players to a single instance of
WeakIndex$(n, 1)$ with 3 players. In order to do so we exploit the direct sum property of the
information cost. The use of a collapsing distribution where $f$ is always 0 is crucial.

Lemma 9. If there is a $p$-round randomized protocol $P$ for WeakIndex$(n, m)$ with bounded error
$\varepsilon$ and messages of size at most $s(m,n)$, then there is a $(p+1)$-round randomized protocol $P'$ for
WeakIndex$(n, 1)$ with bounded error $\varepsilon$, and transcript $\Pi'$ satisfying $|\Pi'| \le 3(p+1)s(m,n)$ and
$\max\{I(D : \Pi'_B | X, K), I(K, D : \Pi'_C | X)\} \le \frac{p+1}{m}\, s(m,n)$.

Sketch of proof. Given a protocol $P$, we show how to construct another protocol $P'$ for any instance
$(x, k, d)$ of WeakIndex$(n, 1)$. In order to avoid any confusion, we denote by $A$, $B$ and $C$ the three
players of $P'$, and by $(A_i, B_i, C_i)_i$ the ones of $P$.

Protocol $P'$

• Using public coins, all players generate uniformly at random $j \in \{1, \dots, m\}$, and
$x_i \in \{0, 1\}^n$ for $i \ne j$

• Players $A$, $B$ and $C$ set respectively their inputs to the ones of $A_j, B_j, C_j$

• For all $i > j$, Player $B$ generates, using its private coins, uniformly at random $k_i \in
\{2, \dots, n\}$, and then it generates uniformly at random $d_i$ such that $f(x_i, k_i, d_i) = 0$

• For all $i < j$, Player $C$ generates, using its private coins, uniformly at random $k_i \in
\{2, \dots, n\}$, and then it generates uniformly at random $d_i$ such that $f(x_i, k_i, d_i) = 0$

• Players $A$, $B$ and $C$ run $P$ as follows. $A$ simulates $A_j$ only, $B$ simulates $B_j$ and
$(A_i, B_i, C_i)_{i > j}$, and $C$ simulates $C_j$ and $(A_i, B_i, C_i)_{i < j}$.

Observe that $A$ starts the protocol if $j = 1$, and $C$ starts otherwise. Moreover $C$ stops the simulation
after $p$ rounds if $j = 1$, and after $p + 1$ rounds otherwise. For all $i \ne j$, entries are generated such that
$f(x_i, k_i, d_i) = 0$, therefore $f_m(x, k, d) = f(x_j, k_j, d_j) = f(x, k, d)$, and $P'$ has the same bounded
error as $P$.

Then we show in Appendix A that $P'$ satisfies the required conditions of the lemma. □
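The embedding step of this sketch of proof can be illustrated as follows. The Python code below plants a single instance $(x, k, d)$ at a random position $j$ among $m$ instances and fills the other positions with inputs on which $f$ is 0, so that $f_m$ of the combined input equals $f(x, k, d)$. It models only how the inputs are generated, not the message exchange or the split between public and private coins; all names are illustrative assumptions.

```python
import random

def f(x, k, d):
    """f(x, k, d) = [(d = 0) and (x[k] = 1)], with k 1-indexed."""
    return int(d == 0 and x[k - 1] == 1)

def f_m(xs, ks, ds):
    """OR of f over all m instances."""
    return int(any(f(x, k, d) for x, k, d in zip(xs, ks, ds)))

def embed(x, k, d, m, rng):
    """Plant (x, k, d) at a uniformly random position j in {1..m}; every other
    position gets uniform x_i and k_i, and a bit d_i uniform subject to f = 0."""
    n = len(x)
    j = rng.randint(1, m)
    xs, ks, ds = [], [], []
    for i in range(1, m + 1):
        if i == j:
            xs.append(list(x)); ks.append(k); ds.append(d)
            continue
        xi = [rng.randint(0, 1) for _ in range(n)]
        ki = rng.randint(2, n)
        di = 1 if xi[ki - 1] == 1 else rng.randint(0, 1)
        xs.append(xi); ks.append(ki); ds.append(di)
    return xs, ks, ds

rng = random.Random(1)
xs, ks, ds = embed([0, 1, 1], 2, 0, m=5, rng=rng)
planted_value = f_m(xs, ks, ds)   # equals f([0, 1, 1], 2, 0) = 1
```

Since all the padded instances evaluate to 0, any protocol that computes $f_m$ on the combined input also computes $f$ on the planted one.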

We now prove a trade-off between the bounded error of a protocol for a single instance of
WeakIndex$(n, 1)$ and its information cost. The proof involves some of the tools of [10] but with
some additional obstacles to apply them. The inherent difficulty is due to the fact that we have 3 players
whereas the cut-and-paste property applies to 2-player protocols. Therefore we have to group 2
players together.

Given some parameters $(x, k, d)$ for an input of WeakIndex$(n, 1)$, we denote by $\Pi(x, k, d)$ the
random variable describing the transcript $\Pi$ of our protocol. We start with two lemmas exploiting
the average encoding theorem (proofs in Appendix A).

Lemma 10. Let $P$ be a randomized protocol for WeakIndex$(n, 1)$ with transcript $\Pi$ satisfying
$|\Pi| \le an$ and $I(K, D : \Pi_C | X) \le a$. Then

$$\mathbb{E}_{x[1,l-1],\,l}\; h^2\big(\Pi(x[1,l-1]\,0\,X[l+1,n],\, l,\, 1),\ \Pi(x[1,l-1]\,1\,X[l+1,n],\, l,\, 1)\big) \le 28a,$$

where $l \in [\frac{n}{2} + 1, n]$ and $x[1, l-1]$ are uniformly distributed.
Lemma 11. Let $P$ be a randomized protocol for WeakIndex$(n, 1)$ with transcript $\Pi$ satisfying
$I(D : \Pi_B | X, K) \le a$. Then

$$\mathbb{E}_{x[1,l-1],\,l}\; h^2\big(\Pi(x[1,l-1]\,0\,X[l+1,n],\, l,\, 0),\ \Pi(x[1,l-1]\,0\,X[l+1,n],\, l,\, 1)\big) \le 12a,$$

where $l \in [\frac{n}{2} + 1, n]$ and $x[1, l-1]$ are uniformly distributed.

We now end with the main lemma which combines both previous ones and applies the cut-and- 
paste property, where Players A, C are grouped. 

Lemma 12. Let $P$ be a randomized protocol for WeakIndex$(n, 1)$ with bounded error $\varepsilon$, and
transcript $\Pi$ satisfying $|\Pi| \le an$ and $\max\{I(D : \Pi_B | X, K), I(K, D : \Pi_C | X)\} \le a$. Then $a \ge
(1 - 2\varepsilon)/10$.

Proof. Let $L$ be a uniform integer random variable in $[\frac{n}{2} + 1, n]$. Recall that we enforce the output
of $P$ to be part of $\Pi$. Therefore, any player, and in particular $B$, can compute $f$ with bounded
error $\varepsilon$ given $\Pi$. Since $f(x[1,l-1]\,0\,X[l+1,n], l, 0) = 0$ and $f(x[1,l-1]\,1\,X[l+1,n], l, 0) = 1$, the
error parameter $\varepsilon$ must satisfy

$$\mathbb{E}_{x[1,l-1],\,l}\; \big\|\Pi(x[1,l-1]\,0\,X[l+1,n],\, l,\, 0) - \Pi(x[1,l-1]\,1\,X[l+1,n],\, l,\, 0)\big\|_1 \ge 2(1 - 2\varepsilon).$$

The rest of the proof consists in upper bounding the LHS by $19a$.

Applying the triangle inequality and $(u + v)^2 \le 2(u^2 + v^2)$ to the inequalities of Lemmas 10
and 11 gives

$$\mathbb{E}_{x[1,l-1],\,l}\; h^2\big(\Pi(x[1,l-1]\,0\,X[l+1,n],\, l,\, 0),\ \Pi(x[1,l-1]\,1\,X[l+1,n],\, l,\, 1)\big) \le 30a.$$

We then apply the cut-and-paste property by considering $(A, C)$ as a single player with transcript
$\Pi_{A,C}$. Therefore

$$\mathbb{E}_{x[1,l-1],\,l}\; h^2\big(\Pi(x[1,l-1]\,0\,X[l+1,n],\, l,\, 1),\ \Pi(x[1,l-1]\,1\,X[l+1,n],\, l,\, 0)\big) \le 30a.$$

Combining again with the inequality from Lemma 11 gives

$$\mathbb{E}_{x[1,l-1],\,l}\; h^2\big(\Pi(x[1,l-1]\,0\,X[l+1,n],\, l,\, 0),\ \Pi(x[1,l-1]\,1\,X[l+1,n],\, l,\, 0)\big) \le 42a.$$

Last, we get the requested upper bound by using the connection between the Hellinger distance and
the $\ell_1$-distance, and the convexity of the square function. □

4 Bidirectional streaming algorithm for PQ 

Remember that in this section our stream is given without any timestamps. Therefore we consider
in this section only streams $w$ of ins$(a)$, ext$(a)$, where $a \in [0, U]$. For the sake of clarity, we assume
for now that the stream has no duplicates. Our algorithms can be extended to the general case, but
the technical difficulties obscure the main ideas.

Up to padding we can assume that $N$ is a power of 2: we append a sequence of
ins$(a)$ext$(a)$ins$(a + 1)$ext$(a + 1) \dots$ of suitable length, where $a$ is large enough so that there
is no duplicate (assuming that $w$ is of even size, otherwise $w \notin$ PQ$(U)$). We use $O(\log N)$ bits of
memory to store, after the first pass, the number of letters padded.

We use a hash function based on the one used by the Karp-Rabin algorithm for pattern matching.
For all this section, let $p$ be a prime number in $\{\max(2U + 1, N^{c+1}), \dots, 2\max(2U + 1, N^{c+1})\}$,
for some fixed constant $c \ge 1$. Since our hash function is linear we only define it for a single
insertion/extraction as

$$\mathrm{hash}(\mathrm{ins}(a)) = \alpha^a \bmod p, \quad\text{and}\quad \mathrm{hash}(\mathrm{ext}(a)) = -\alpha^a \bmod p,$$

where $\alpha$ is a randomly chosen integer in $[0, p - 1]$. This is the unique source of randomness of
our algorithm. A hashcode $h$ encodes a sequence $w$ if $h = \mathrm{hash}(w)$ as a formal polynomial in $\alpha$.
In that case we say that $h$ includes $w[i]$, for all $i$. Moreover $w$ is balanced if the same integers
have been inserted and extracted. In that case it must be that $h = 0$. We also say that $h$ is
balanced if it encodes a balanced sequence $w$. The converse is also true with high probability by
the Schwartz-Zippel lemma.

Fact 13. Let $w$ be some unbalanced sequence. Then $\Pr(\mathrm{hash}(w) = 0) \le N^{-c}$.
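The hash function described above is easy to prototype. The Python sketch below picks a prime in the prescribed range by trial division, which is only practical for small parameters, and hashes a sequence letter by letter. The function names and the tuple encoding of letters are illustrative assumptions.

```python
def is_prime(q):
    """Trial division; adequate only for the small moduli of this illustration."""
    if q < 2:
        return False
    i = 2
    while i * i <= q:
        if q % i == 0:
            return False
        i += 1
    return True

def pick_prime(U, N, c=1):
    """Smallest prime that is at least max(2U + 1, N^(c+1)). By Bertrand's
    postulate it is at most twice that bound, so it lies in the required range."""
    q = max(2 * U + 1, N ** (c + 1))
    while not is_prime(q):
        q += 1
    return q

def hash_letter(letter, p, alpha):
    """hash(ins(a)) = alpha^a mod p and hash(ext(a)) = -alpha^a mod p."""
    op, a = letter
    val = pow(alpha, a, p)
    return val if op == "ins" else (-val) % p

def hash_sequence(w, p, alpha):
    """The hash is linear: the hashcode of a sequence is the sum of its letters."""
    return sum(hash_letter(letter, p, alpha) for letter in w) % p

p = pick_prime(U=10, N=4, c=1)
balanced = [("ins", 3), ("ins", 5), ("ext", 5), ("ext", 3)]
h_balanced = hash_sequence(balanced, p, alpha=2)
```

A balanced sequence hashes to 0 for every choice of `alpha`, whereas an unbalanced one hashes to 0 only for the few values of `alpha` that are roots of the corresponding polynomial.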

The forward-pass algorithm was introduced in [7], but the reverse-pass one is even simpler. As a
warm-up, we start by introducing the latter algorithm. In order to keep it simple to understand,
we do not optimize it fully. Last, define the instruction Update$(h, v)$ that returns $(h + \mathrm{hash}(v))
\bmod p$ and updates $h$ to that value.

4.1 One-reverse-pass algorithm for PQ 

Our algorithm decomposes the stream $w$ into blocks. We call a valley an extraction $w[t] = \mathrm{ext}(a)$
with $w[t + 1] = \mathrm{ins}(b)$. A new block starts at every valley. To the $i$-th block we associate a
hashcode $h_i$ and an integer $m_i$. Hashcode $h_i$ encodes all the extractions within the block and the
matching insertions. Integer $m_i$ is the minimum of extractions in the block. With the values $(m_i)_i$,
one can encode insertions in the correct $h_i$ if $w \in$ PQ. Observe that we use index notations for
block indices and bracket notations for stream positions.

Algorithm 1 uses memory space $O(r)$, where $r$ is the number of valleys in $w$. We could make it
run with memory space $O(\sqrt{N} \log N)$ by reducing the number of valleys as in [7]. We do not need
to as we use another compression in the two-pass algorithm.

We first state a crucial property of Algorithm 1 and then show that it satisfies Theorem 15
when there is no duplicate. Recall that we process the stream from right to left.

Lemma 14. Consider Algorithm 1 right after processing ins$(a)$. Assume that ext$(a)$ has been
already processed. Let $h_k, h_{k'}$ be the respective hashcodes including ext$(a)$, ins$(a)$. Then $k = k'$ if
and only if all ext$(b)$ occurring between ext$(a)$ and ins$(a)$ satisfy $b > a$.

Theorem 15. There is a 1-reverse-pass randomized streaming algorithm for PQ$(U)$ with memory
space $O(r(\log N + \log U))$ and one-sided bounded error $N^{-c}$, for inputs of length $N$ with $r$ valleys,
and any constant $c > 0$.

Proof. We show that Algorithm 1 suits the conditions, assuming there is no duplicate. Let $w \in$
PQ$(U)$. Then $w$ always passes the test at line 10. Moreover, by Lemma 14, each insertion ins$(a)$
is necessarily in the same hashcode as its matching extraction ext$(a)$. Therefore, all hashcodes






Algorithm 1: One-reverse-pass algorithm for PQ

1 m_0 ← −∞ ; h_0 ← 0 ; t ← N ; i ← 0 // i is called the block index
2 While t > 0
3   If w[t] = ins(a)
4     k ← max{j ≤ i : m_j ≤ a} // Compute the hashcode index of a
5     Update(h_k, w[t])
6   Else w[t] = ext(a)
7     If w[t + 1] = ins(b) // This is a valley. We start a new block
8       i ← i + 1 ; m_i ← a ; h_i ← 0 // Create a new hashcode
9     Else w[t + 1] = ext(b)
10      Check(a > b) // Check that extractions are well-ordered
11    Update(h_i, w[t])
12  t ← t − 1
13 For j = 0 to i: Check(h_j = 0) // Check that hashcodes are balanced w.h.p.
14 Accept // w succeeded to all checks



equal 0 at line 13 since they are balanced. In conclusion, the algorithm accepts $w$ with probability
1.

Assume now that $w \notin$ PQ. First we show that unbalanced $w$ are rejected with high probability,
that is at least $1 - N^{-c}$, at line 13 if they are not rejected before. Indeed, since each $w[t]$ is
encoded in some $h_j$, at least one $h_j$ must be unbalanced. Then by Fact 13 the algorithm rejects
w.h.p. We end the proof assuming $w$ balanced. Recall that we process the stream from right
to left. The two remaining possible errors are: (1) ins$(a)$ is processed before ext$(a)$, for some
$a$; and (2) ext$(a)$, ext$(b)$, ins$(a)$ are processed in this order with $b < a$ and possibly intermediate
insertions/extractions. In both cases, we show that some hashcodes are unbalanced at line 13 and
therefore fail the test w.h.p. by Fact 13, except if the algorithm rejects before.

Consider case (1). Since ins$(a)$ is processed before ext$(a)$, there is at least one valley between
ins$(a)$ and ext$(a)$. Therefore ins$(a)$ and ext$(a)$ are encoded into two different hashcodes, which
are unbalanced at line 13.

Consider now case (2). Lemma 14 gives that ext$(a)$ and ins$(a)$ are encoded in two different
hashcodes, which are again unbalanced at line 13. □
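The one-reverse-pass algorithm can be prototyped in a few lines. The Python sketch below follows the pseudocode of Algorithm 1 under two assumptions that the pseudocode leaves implicit: a final extraction with no successor is treated as the start of the first block, and the hashcode index is found by a linear scan. The modulus `p` and the value `alpha` are passed explicitly so that the behavior is reproducible; the function name and letter encoding are illustrative.

```python
def reverse_pass_check(w, p, alpha):
    """Sketch of Algorithm 1 for streams without duplicates.
    w is a list of ("ins", a) / ("ext", a); returns True if w is accepted."""
    def hash_letter(letter):
        op, a = letter
        val = pow(alpha, a, p)
        return val if op == "ins" else (-val) % p

    m = [float("-inf")]   # m[0] = -infinity catches insertions with no block
    h = [0]               # h[j] is the hashcode of block j
    for t in range(len(w) - 1, -1, -1):          # read from right to left
        op, a = w[t]
        if op == "ins":
            # Hashcode index of a: the last block whose minimum is at most a.
            k = max(j for j in range(len(m)) if m[j] <= a)
            h[k] = (h[k] + hash_letter(w[t])) % p
        else:
            nxt = w[t + 1] if t + 1 < len(w) else None
            if nxt is None or nxt[0] == "ins":
                # Valley (or end of stream, by assumption): start a new block.
                m.append(a)
                h.append(0)
            elif not a > nxt[1]:
                return False                     # extractions are not well-ordered
            h[-1] = (h[-1] + hash_letter(w[t])) % p
    return all(code == 0 for code in h)          # every hashcode must be balanced

good = [("ins", 5), ("ins", 3), ("ext", 5), ("ins", 4), ("ext", 4), ("ext", 3)]
bad = [("ins", 5), ("ins", 3), ("ext", 3), ("ins", 4), ("ext", 5), ("ext", 4)]
accepted_good = reverse_pass_check(good, p=23, alpha=2)
accepted_bad = reverse_pass_check(bad, p=23, alpha=2)
```

In the second sequence, 3 is extracted while 5 is still in the queue. This is case (2) of the proof: ins(5) and ext(5) end up in different hashcodes, so one of them is unbalanced.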

4.2 Bidirectional two-pass algorithm 

Algorithm 2 performs one pass in each direction using Algorithm 3. We use the hierarchical
data structure of [15] in order to reduce the number of blocks. A block of size $2^i$ is of the form
$[(q - 1)2^i + 1, q2^i]$, for $1 \le q \le N/2^i$. Observe that, given two such blocks, either they are disjoint
or one is included in the other. We decompose dynamically the letters of $w$ that have been already
processed into nested blocks of $2^i$ letters as follows. Each new processed letter of $w$ defines a new
block. When two blocks have the same size, they merge. All processed blocks are pushed on a stack.
Therefore, only the two topmost blocks of the stack may potentially merge. Because the size of
each block is a power of 2 and at most two blocks have the same size (before merging), there are
at most $\log N + 1$ blocks at any time.
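The dynamic decomposition into blocks behaves like a binary counter. The Python sketch below tracks only block sizes, leaving out hashcodes and minimums, and records the largest number of blocks ever present on the stack. The function name is an illustrative assumption.

```python
def block_decomposition(N):
    """Process N letters; return the final stack of block sizes and the
    largest stack length observed (measured before merging)."""
    stack, max_len = [], 0
    for _ in range(N):
        stack.append(1)                       # each new letter is a block of size 1
        max_len = max(max_len, len(stack))
        # Only the two topmost blocks can have the same size.
        while len(stack) >= 2 and stack[-1] == stack[-2]:
            stack.append(stack.pop() + stack.pop())
    return stack, max_len

final_stack, max_blocks = block_decomposition(16)
```

For $N = 16$ the stack ends with a single block of size 16, and it never holds more than $\log_2 16 + 1 = 5$ blocks.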

Moreover, since our stream size is a power of 2, all blocks eventually appear in the hierarchical 
decomposition, whether we read the stream from left to right or from right to left. In fact, if two 
same-sized blocks appear simultaneously in one decomposition before merging, the same is true in 
Algorithm 2: Bidirectional 2-pass algorithm for PQ

1 OnePassAlgorithm(w) reading stream from left to right
2 OnePassAlgorithm(w) reading stream from right to left
3 Accept // w succeeded to all checks

Algorithm 3: OnePassAlgorithm

1 S ← []
2 If left-to-right-pass Then Push(S, (0, −∞, 0)) // Initialization of S
3 While stream is not empty
4   Read(next letter v on stream) // See below
5   While the 2 topmost elements of S have same block size ℓ
6     (h_1, m_1, ℓ) ← Pop(S) ; (h_2, m_2, ℓ) ← Pop(S)
7     Push(S, (h_1 + h_2 mod p, min(m_1, m_2), 2ℓ)) // Merge of 2 blocks
8 If left-to-right-pass Then Check(S = [(0, −∞, 0), (0, 0, N)])
9 Else Check(S = [(0, 0, N)])
10 Return
11
12 Function Read(v):
13   Case v = ins(a) // When reading an insertion
14     Let (h, m, ℓ) be the first item of S from top such that a ≥ m
15     Replace (h, m, ℓ) by (Update(h, v), m, ℓ)
16     Push(S, (0, +∞, 1))
17   Case v = ext(a) and left-to-right-pass // When reading an extraction
18     For all items (h, m, ℓ) on S such that m > a: Check(h = 0)
19     Let (h, m, ℓ) be the first item of S from top such that a ≥ m
20     Replace (h, m, ℓ) by (Update(h, v), m, ℓ)
21     Push(S, (0, a, 1))
22   Case v = ext(a) and right-to-left-pass // When reading an extraction
23     For all items (h, m, ℓ) on S such that m > a: Check(h = 0)
24     Push(S, (hash(v), a, 1))



the other decomposition. This point is crucial for our analysis. 

Algorithm 3 uses the following description of a block $B$: its hashcode $h_B$, the minimum $m_B$ of
its extractions, and its size $\ell_B$. For the analysis, we also denote by $t_B$ the index such that $w[t_B] = \mathrm{ext}(m_B)$.
Among those parameters, only $h_B$ can change without $B$ being merged with another block. On the
pass from right to left, all extractions from the block and the matching insertions are included in
$h_B$. On the pass from left to right, insertions are included in the hashcode of the earliest possible
block where they could have been, and the extractions are included with their matching insertions.
The minimums are used to decide where to include values (except extractions on the pass
from right to left). Observe that it is important to check that $h_B = 0$ whenever possible and not
at the end of the execution of the algorithm, since only one block is left at the end.

When there is some ambiguity, we denote by and the hashcodes for the left-to-right and 
right-to- left passes. Observe that ?ns, t_B, £b are identical in both directions. 

Proof of Theorem [^}. We show that Algorithm 2 suits the conditions, assuming there is no duplicate. The space constraints are satisfied because each element of $S$ takes space $O(\log N + \log U)$ and $S$ has size at most $\log N + 1$. The processing time follows from inspection.
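The bound on the size of $S$ comes from the merging loop at lines 5 to 7 of the algorithm, which behaves like a binary counter. The following Python sketch isolates only that loop, under simplifying assumptions: every letter contributes one unit block given directly as a (hashcode, minimum) pair, the sentinel block of the left-to-right pass is omitted, and the names `merge_blocks` and `max_height` are hypothetical.

```python
def merge_blocks(unit_blocks, p):
    """Push unit blocks and merge equal-size neighbours, as in lines 5-7.

    unit_blocks: iterable of (hashcode, minimum) pairs, one per letter
                 (a simplification: Read() is not modelled here).
    p:           modulus used when adding hashcodes.
    Returns the final stack and the largest stack height seen after merging.
    """
    stack = []          # items are (hashcode, minimum, block size)
    max_height = 0
    for h, m in unit_blocks:
        stack.append((h % p, m, 1))
        # Merge while the two topmost blocks have the same size.
        while len(stack) >= 2 and stack[-1][2] == stack[-2][2]:
            h1, m1, size = stack.pop()
            h2, m2, _ = stack.pop()
            stack.append(((h1 + h2) % p, min(m1, m2), 2 * size))
        max_height = max(max_height, len(stack))
    return stack, max_height


if __name__ == "__main__":
    blocks = [(i, i + 10) for i in range(16)]
    print(merge_blocks(blocks, p=101))
```

After the $t$-th letter, the block sizes on the stack are the powers of two in the binary expansion of $t$, which is why the height stays logarithmic in this sketch.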
Figure 2: Relative positions of insertions and extractions used in the proof of Theorem H]

As with Theorem 15, inputs in $\mathrm{PQ}(U)$ are accepted with probability 1, and unbalanced inputs are rejected with high probability (at least $1 - N^{-c}$). Let $w \notin \mathrm{PQ}$ be balanced. For ease of notation, let $w[-1] = \mathrm{ins}(-\infty)$ and $w[0] = \mathrm{ext}(-\infty)$. Then, there are $\tau < \rho$ such that $w[\tau] = \mathrm{ext}(b)$, $w[\rho] = \mathrm{ext}(a)$, $a > b$, and $w[t] \neq \mathrm{ins}(a)$ for all $\tau < t < \rho$.

Among those pairs $(\tau, \rho)$, consider the ones with the smallest $\rho$. From those, select the one with the smallest $b$, with $w[\tau] = \mathrm{ext}(b)$. Let $B$, $C$ be the largest possible disjoint blocks such that $\tau$ is in $B$ and $\rho$ in $C$. Then $B$ and $C$ have the same size, are contiguous, and appear simultaneously in each direction before they merge. Let $\rho'$ and $\tau'$ be such that $w[\rho'] = \mathrm{ins}(a)$ and $w[\tau'] = \mathrm{ins}(b)$. The minimality of $\rho$ and the minimality of $b$ guarantee that $w[t]$ is an insertion for all $\tau < t < \rho$. Indeed, if $w[t] = \mathrm{ext}(c)$, either $b > c$, which contradicts the minimality of $b$, or $c > b$ and $(\tau, t)$ contradicts the minimality of $\rho$. In particular, $t_C \ge \rho$ and $t_B \le \tau$. Similarly $\tau' < \tau$: otherwise no $\mathrm{ins}(b)$ would occur between $w[0] = \mathrm{ext}(-\infty)$ and $w[\tau]$, and $\tau$ would be a better candidate than $\rho$.
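The choice of the pair $(\tau, \rho)$ can be made concrete with a brute-force search. The sketch below is only an illustration of the selection rule stated above (smallest $\rho$, then smallest $b$); it is quadratic and is not part of the streaming algorithm. The function name `find_witness` and the encoding of letters as `('ins', a)` and `('ext', a)` tuples are assumptions made for the example.

```python
import math


def find_witness(ops):
    """Return the pair (tau, rho) selected in the proof, or None.

    ops: list of ('ins', a) / ('ext', a) letters, indexed from 1 in the
         proof; w[0] = ext(-inf) is prepended as in the text.
    A pair qualifies when w[tau] = ext(b), w[rho] = ext(a), a > b, and
    no ins(a) occurs strictly between tau and rho.
    """
    w = [("ext", -math.inf)] + list(ops)
    for rho in range(1, len(w)):          # smallest rho first
        kind, a = w[rho]
        if kind != "ext":
            continue
        candidates = []
        for tau in range(rho):
            kind_t, b = w[tau]
            if kind_t != "ext" or not a > b:
                continue
            if all(w[t] != ("ins", a) for t in range(tau + 1, rho)):
                candidates.append((b, tau))
        if candidates:
            b, tau = min(candidates)      # smallest b among those pairs
            return tau, rho
    return None


if __name__ == "__main__":
    bad = [("ins", 2), ("ins", 5), ("ext", 2), ("ext", 5)]
    print(find_witness(bad))
```

On the sequence ins(2), ins(5), ext(2), ext(5), the search returns positions 3 and 4: ext(2) is followed by ext(5) with no ins(5) in between.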

We distinguish three cases based on the position $\rho'$ of $\mathrm{ins}(a)$ (see Figure 2): $\rho' \notin [t_B, t_C]$, $t_B < \rho' < \tau$, and $\rho < \rho' < t_C$. These cases determine in which hashcode $\mathrm{ins}(a)$ is included. We analyze Algorithm 3 when some letter is processed before blocks potentially merge.

Case 1: $\rho' \notin [t_B, t_C]$. One can prove that $h_B^{\rightarrow}$ is unbalanced when $w[t_C]$ is processed and that $h_C^{\leftarrow}$ is unbalanced when $w[t_B]$ is processed; therefore Algorithm 3 detects w.h.p. $h_B^{\rightarrow} \neq 0$ or $h_C^{\leftarrow} \neq 0$ depending on whether $m_B > m_C$ (see Lemma 19 in Appendix B).

Case 2: $t_B < \rho' < \tau$. We show that when Algorithm 3 processes $w[t_B] = \mathrm{ext}(m_B)$, it checks $h_D^{\leftarrow} = 0$ at line 23 for some $h_D^{\leftarrow}$ including $\mathrm{ins}(a)$ but not $\mathrm{ext}(a)$. Thus it rejects w.h.p.

When $w[\rho'] = \mathrm{ins}(a)$ is processed on the right-to-left pass, $\tau \in B_1$ with $B_1$ a block in the stack. Since $\tau \in B$, the block $B_1$ intersects $B$. Because $B_1 \not\supset B$, we have $B_1 \subseteq B$. Because $w[\tau] = \mathrm{ext}(b)$, we have $a > b \ge m_{B_1}$, and block $B_1$ is eligible at line 14 of Algorithm 3, meaning that $w[\rho'] = \mathrm{ins}(a)$ is included in either $h_{B_1}^{\leftarrow}$ or a more recent hashcode $h_{B_2}^{\leftarrow}$. Since $\rho' \in B$, again $B_2 \subseteq B$. Last, when Algorithm 3 processes $w[t_B] = \mathrm{ext}(m_B)$, since we are still within $B$, some hashcode $h_{B_3}^{\leftarrow}$, with $B_3 \subseteq B$, includes $w[\rho']$. Moreover, $h_{B_3}^{\leftarrow}$ does not include $w[\rho] = \mathrm{ext}(a)$ since $\rho \in C$ and $C$ comes before $B$. Last, $m_{B_3} \ge m_B$, by definition of $m_B$. Hence, Algorithm 3 checks $h_{B_3}^{\leftarrow} = 0$ at line 23 when processing $w[t_B]$. $B_3$ satisfies the conditions for $D$ when $w[t_B]$ is processed, and Algorithm 3 rejects w.h.p.

Case 3: $\rho < \rho' < t_C$. The proof is the same as case 2, replacing $\tau$, $B$, $B_1$, $B_2$, $B_3$, $h_{B_1}^{\leftarrow}$, $h_{B_2}^{\leftarrow}$, $h_{B_3}^{\leftarrow}$, $t_B$, $C$ with $\rho$, $C$, $C_1$, $C_2$, $C_3$, $h_{C_1}^{\rightarrow}$, $h_{C_2}^{\rightarrow}$, $h_{C_3}^{\rightarrow}$, $t_C$, $B$ and line 23 with line 18. Note that we only have $a \ge m_{C_1}$ this time, so it is important that the inequality at line 14 is non-strict. □

4.3 Generalization when duplicates occur 

We maintain two additional parameters $\delta_B$ and $c_B$ for each block $B$. The difference between the number of insertions and extractions included in $h_B$ is stored in $\delta_B$. Whenever $\delta_B = 0$, we check $h_B = 0$. The number of unmatched occurrences of $\mathrm{ins}(m_B)$ for the left-to-right pass (resp. $\mathrm{ext}(m_B)$ for the right-to-left pass) is stored in $c_B$. We can then appropriately determine whether each $\mathrm{ext}(m_B)$ (resp. $\mathrm{ins}(m_B)$) should be included in $h_B$.

The change in the criterion of line 14 of Algorithm 3 makes the proof of case 3 of the theorem longer and breaks the symmetry.
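The role of the counter $\delta_B$ can be illustrated by a small Python sketch. It assumes, purely for illustration, a polynomial fingerprint in which $\mathrm{ins}(a)$ adds and $\mathrm{ext}(a)$ subtracts a power of a fixed base modulo a prime; the hash function actually used by the algorithm is defined earlier in the paper and may differ. The class name `BlockHash`, the constants `P` and `BASE`, and the omission of $c_B$ are assumptions of this sketch.

```python
P = 2_147_483_647   # prime modulus (hypothetical choice)
BASE = 48_271       # evaluation point (hypothetical; random in practice)


class BlockHash:
    """Hashcode of one block together with its balance counter delta."""

    def __init__(self):
        self.h = 0       # fingerprint of the letters included so far
        self.delta = 0   # number of insertions minus number of extractions

    def update(self, kind, a):
        """Include ins(a) or ext(a); a is a non-negative integer here."""
        sign = 1 if kind == "ins" else -1
        self.h = (self.h + sign * pow(BASE, a, P)) % P
        self.delta += sign

    def check(self):
        """Whenever delta is 0, the hashcode must be 0 as well."""
        return self.delta != 0 or self.h == 0


if __name__ == "__main__":
    block = BlockHash()
    block.update("ins", 3)
    block.update("ins", 3)   # duplicates are allowed
    block.update("ext", 3)
    print(block.check())     # delta is 1, so nothing is checked yet
```

With duplicates, the hashcode alone no longer says when a block should be empty, so in this sketch the counter decides when the test $h_B = 0$ is meaningful.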
A Missing proofs for the lower bound

We start by proving the lemma relating the streaming complexity of deciding if an instance of Raindrops($m, n$) belongs to PQ-TS($3mn$) to the communication complexity of WeakIndex($n, m$).

Proof of Lemma. Assume that there exists a $p$-pass randomized streaming algorithm with memory space $s(m, n)$ that decides whether an instance of Raindrops($m, n$) belongs to PQ-TS($3mn$). Each instance of Raindrops($m, n$) can be encoded by an input of WeakIndex($n, m$), where each of the $3m$ players has one part of it. The rest of the proof consists in showing how the players can use the algorithm to construct a protocol that satisfies the required properties of the lemma.

The players take turns simulating the algorithm. A player performs the simulation until the algorithm reaches the part of the input of the next player. Then the player sends the current state of the algorithm, so that the next player can continue the simulation. Since the algorithm uses memory space at most $s(m, n)$, the current state can be encoded using $s(m, n)$ bits. Each pass corresponds to one round of communication, implying the result. □
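The simulation argument can be sketched in a few lines of Python. The sketch assumes a deterministic streaming algorithm given as a state-transition function `step`; the function names, the toy transition used in the usage example, and the convention that every player sends exactly one message per pass are assumptions of this illustration, not details taken from the lemma.

```python
def run_stream(step, state, stream, passes):
    """Run the streaming algorithm alone, making `passes` passes."""
    for _ in range(passes):
        for letter in stream:
            state = step(state, letter)
    return state


def run_protocol(step, state, parts, passes):
    """Players simulate the algorithm in turn on their own part.

    parts: list of input chunks, one per player, in stream order.
    Returns the final state and the number of state handoffs (messages).
    """
    messages = 0
    for _ in range(passes):              # one round of communication per pass
        for part in parts:
            for letter in part:          # the player simulates on its part
                state = step(state, letter)
            messages += 1                # then sends the current state along
    return state, messages


if __name__ == "__main__":
    step = lambda s, x: (31 * s + x) % 1009   # toy transition function
    stream = list(range(20))
    parts = [stream[i:i + 5] for i in range(0, 20, 5)]
    print(run_stream(step, 0, stream, 2), run_protocol(step, 0, parts, 2))
```

Each message is one copy of the algorithm's state, so its length is bounded by the memory space of the algorithm.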

Before giving the next missing proofs of Section 3, we state some useful properties of entropy and mutual information that we need. See [10] for more information.
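Conditional mutual information can be computed directly on small distributions, which gives a quick sanity check of the properties used in this appendix. The Python sketch below evaluates $I(X : Y | Z)$ from a joint distribution using the identity $I(X : Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)$. The function names and the example distribution ($X$, $Z$ independent uniform bits and $Y = X \oplus Z$) are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import product
from math import log2


def entropy(joint, keep):
    """Entropy (in bits) of the coordinates listed in `keep`."""
    marginal = defaultdict(float)
    for outcome, prob in joint.items():
        marginal[tuple(outcome[i] for i in keep)] += prob
    return -sum(q * log2(q) for q in marginal.values() if q > 0)


def cond_mi(joint, xs, ys, zs):
    """I(X : Y | Z) for coordinate lists xs, ys, zs of the outcomes."""
    return (entropy(joint, xs + zs) + entropy(joint, ys + zs)
            - entropy(joint, xs + ys + zs) - entropy(joint, zs))


# Outcomes are triples (x, y, z) with x, z independent uniform bits, y = x xor z.
JOINT = {(x, x ^ z, z): 0.25 for x, z in product((0, 1), repeat=2)}

if __name__ == "__main__":
    print(cond_mi(JOINT, [0], [1], [2]))   # I(X : Y | Z)
    print(cond_mi(JOINT, [0], [1], []))    # I(X : Y)
```

In this example $X$ and $Z$ are independent, and conditioning on $Z$ raises the information between $X$ and $Y$ from 0 to 1 bit, in line with Fact 16 taken with a trivial $R$.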

Fact 16. Let $X, Y, Z, R$ be random variables such that $X$ and $Z$ are independent when conditioning on $R$, namely when conditioning on $R = r$, for each possible value of $r$. Then $I(X : Y | Z, R) \ge I(X : Y | R)$.






Proof. From the definition of mutual information and the independence of $X$, $Z$ when conditioning on $R$, we get that

$$I(X : Y | Z, R) = H(X | Z, R) - H(X | Y, Z, R) = H(X | R) - H(X | Y, Z, R).$$

Using that entropy can only decrease under conditioning, and using again the definition of mutual information, we conclude by bounding the last term as

$$H(X | R) - H(X | Y, Z, R) \ge H(X | R) - H(X | Y, R) = I(X : Y | R).$$

□

Proposition 17 (Chain rule). Let $X, Y, Z, R$ be random variables. Then $I(X, Y : Z | R) = I(X : Z | R) + I(Y : Z | X, R)$.

Proposition 18 (Data processing inequality). Let $X, Y, Z, R$ be random variables such that $R$ is independent from $X, Y, Z$. Then $I(X : Y | Z) \ge I(f(X, R) : Y | Z)$, for every function $f$.

Note that the previous property is usually stated with no variable $R$. Nonetheless, since $R$ is independent from the other variables, we have $I(X : Y | Z) = I(X, R : Y | Z)$, and then we can apply the usual data processing inequality.

We can now prove our three lemmas.

End of proof of LemmaUH. Let $\Pi, \Pi'$ be the respective transcripts of $P, P'$. For convenience, note n<7 m+ i = n# m , IIb = IIc m and n<7 m+1 = II^. Recall that the public coins of a protocol are included in its transcript.

First, each player of $P'$ sends 3 messages per round, and there are $(p + 1)$ rounds. Since each message has size at most $s(m, n)$, we derive that the length of $\Pi'$ is at most $3(p + 1)s(m, n)$.

Then, in order to prove that there is only a small amount of information in the transcripts of Bob and Charlie, we show a direct sum property for an appropriate notion of information cost. Consider first the transcript of Player $C_1$. Because of the restriction on the size of his messages, we know that $|\Pi_{C_1}| \le (p + 1)s(m, n)$. From this we derive a first inequality on the amount of information this transcript can carry, using that the entropy of a variable is at most its bit-size:

$$I(K, D : \Pi_{C_1} | X) \le |\Pi_{C_1}| \le (p + 1)s(m, n).$$

We now use the chain rule in order to get a bound on the information carried by $P'$ on a single instance.



in 



I(K,D :IId|X) 



^Jl((l£j, Di)j>i : IId|X, (Ki,Di) i<:j ) (by chain rule) 



in 




> 



KKj , Dj : ILb.^ |X) (by Fact [ED 



j'=i 

m x l(Kj,Dj : IIb j _ 1 |X, J) (by conditioning on J) 

m x l(Kj,Dj : IIbj.^J, (Xi) if Lj\Xj) (independence of J, (X^j) 

m x I(K,D : II^|X) (since J, (Xj)j^j are public coins of P ). 
We then do similarly for Player $B_m$ and therefore conclude the proof. First the size bound on messages of $B_m$ gives $I(\Pi_{B_m} : D | X, K) \le (p + 1)s(m, n)$. Then as before we get:

m m 

I(D : n Bm |X,K) = J2l(Dj : n B jX,K, (A)i>i) > : n ^+il X ' K ' (A)i>i) 

j=i j=l 
> m x : n Cj+1 , J, (XJ^jIXj,^) = m x I(D : W B \X, K). 

□ 

Proof of Lemma \1(A From the second hypothesis and the data processing inequality we 
get that I(K,D : TLa,c\X) < a, which after applying the average encoding leads to 
^x,k,dh 2 (HA,c{x, k,d),TlA,c(x, K, D)) < na. We now restrict uo by conditioning on D = 1. 
Then (X,K) is uniformly distributed. Moreover, since D = 1 with probability 3/4 on fio, we 
get E^h^rU c(x,k, 1),Ra,c(x,K,1)) < nua. Let J, L be uniform integer random variables re- 
spectively in [2,^] and + l,n]. Then the above implies K x j h 2 (IL4,c(x, j, 1), Ha,c( x , K, 1)) < 
|«a and E^z h^II^c^, I, 1), Ua,c(x, K, 1)) < |«a. Applying the triangle inequality and that 
(u + v) 2 < 2(u 2 + u 2 ), we get 

E h 2 (U A ,c{x,j,l),IlA,c(x,l,l)) < fna. 

Using the convexity of h 2 , we finally obtain for b = 0, 1: 

E h 2 (ri4. c (x[l,Z - l]bX[l + l,n],j, l),U A ,c(x[l,l- l]bX[l + l,n],l, 1)) < f na. 

Now the chain rule allows us to measure the information about a single bit in $\Pi_{A,C}$ as

$$I(X[L] : \Pi_{A,C}(X, J, 1) | X[1, L-1]) = \mathbb{E}_{l}\, I(X[l] : \Pi_{A,C}(X, J, 1) | X[1, l-1]) = \frac{2}{n} \times I(X[\tfrac{n}{2} + 1, n] : \Pi_{A,C}(X, J, 1) | X[1, \tfrac{n}{2}]).$$

Since the entropy of a variable is at most its bit-size, we get that the last term is upper bounded by $|\Pi_{A,C}|$, which is at most $\alpha n$ by the first hypothesis. Then as before, the average encoding and the triangle inequality lead to

E h 2 {U A ,c(x[l,l - l]0X[l + l,n], j,l),U A ,c(x[l,l - 1]IX[1 + l,n], j, 1)) < 16 K a. 
Combining gives 

E h 2 (U A ,c{x[l,l - l]0X[l + l,n],l, 1), IL A ,c(x[l, I - 1]IX[1 + l,n],Z,l)) < 28a. 

x[L,l— ' 

Let $R_B$ be the random coins of $B$. Since they are independent from all variables, including the messages, the previous inequality is still true when we concatenate $R_B$ to $\Pi_{A,C}$. Then $\Pi_B$ is uniquely determined from $R_B$ once $K$, $D$, $X[1, K-1]$ are fixed, which is the case in that inequality. Therefore replacing $R_B$ by $\Pi_B$ can only decrease the distance, concluding the proof. □
Proof of Lemma. Using the data processing inequality and the hypothesis we get that $I(D : \Pi | X, K) \le \alpha$. Therefore by average encoding, $\mathbb{E}_{x,k,d}\, h^2(\Pi(x, k, d), \Pi(x, k, D)) \le \kappa\alpha$.

Let $L$ be a uniform integer random variable in $[\frac{n}{2} + 1, n]$. Then $\mathbb{E}_{x,l,d}\, h^2(\Pi(x, l, d), \Pi(x, l, D)) \le 2\kappa\alpha$. Using the convexity of $h^2$ and the fact that $X[l]$ is a uniform random bit, we derive

$$\mathbb{E}_{x[1,l-1],\,l,\,d}\; h^2\big(\Pi(x[1,l-1]\,0\,X[l+1,n], l, d),\ \Pi(x[1,l-1]\,0\,X[l+1,n], l, D)\big) \le 4\kappa\alpha.$$

Since $D = 0$ with probability $1/2$ when $X[l] = 0$ and $K = l$, we finally get the two inequalities

$$\mathbb{E}_{x[1,l-1],\,l}\; h^2\big(\Pi(x[1,l-1]\,0\,X[l+1,n], l, 0),\ \Pi(x[1,l-1]\,0\,X[l+1,n], l, D)\big) \le 8\kappa\alpha,$$

$$\mathbb{E}_{x[1,l-1],\,l}\; h^2\big(\Pi(x[1,l-1]\,0\,X[l+1,n], l, 1),\ \Pi(x[1,l-1]\,0\,X[l+1,n], l, D)\big) \le 8\kappa\alpha,$$

leading to the conclusion using the triangle inequality and that $(u + v)^2 \le 2(u^2 + v^2)$. □
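The last step combines the triangle inequality for $h$ with $(u + v)^2 \le 2(u^2 + v^2)$, which yields $h^2(P, R) \le 2(h^2(P, Q) + h^2(Q, R))$ for any three distributions. The Python sketch below checks this numerically. It assumes the standard definition $h^2(P, Q) = 1 - \sum_i \sqrt{p_i q_i}$ of the squared Hellinger distance, which may differ from the paper's normalization by a constant factor, and the function names are hypothetical.

```python
from math import sqrt


def hellinger_sq(p, q):
    """Squared Hellinger distance between two finite distributions.

    p, q: sequences of probabilities over the same outcomes.
    Uses h^2(P, Q) = 1 - sum_i sqrt(p_i * q_i) (an assumed normalization).
    """
    return 1.0 - sum(sqrt(a * b) for a, b in zip(p, q))


def relaxed_triangle_holds(p, q, r):
    """Check h^2(P, R) <= 2 * (h^2(P, Q) + h^2(Q, R))."""
    return hellinger_sq(p, r) <= 2 * (hellinger_sq(p, q) + hellinger_sq(q, r))


if __name__ == "__main__":
    p, q, r = [0.5, 0.5], [0.7, 0.3], [0.9, 0.1]
    print(hellinger_sq(p, r), relaxed_triangle_holds(p, q, r))
```

The factor 2 is what turns two bounds of $8\kappa\alpha$ on squared distances to a common point into a single bound on the squared distance between the two endpoints.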



B Missing proofs for the algorithm 

We start by proving the property of Algorithm 1 we use in the proof of Theorem 15.



Proof of Lemma 14. Recall again that we process the stream from right to left in this proof, and that $h_k$, $h_{k'}$ are the respective hashcodes including $\mathrm{ext}(a)$, $\mathrm{ins}(a)$. First assume that all $\mathrm{ext}(b)$ between $\mathrm{ext}(a)$ and $\mathrm{ins}(a)$ satisfy $b > a$. Let $i$ be the current block index while processing $\mathrm{ins}(a)$. Observe that $k$ is the current block index right after processing $\mathrm{ext}(a)$. Since $\mathrm{ext}(a)$ is processed before $\mathrm{ins}(a)$ and since there is a valley between $\mathrm{ext}(a)$ and $\mathrm{ins}(a)$, we have $k < i$.

We prove that $k' = \max\{j \le i \mid m_j \le a\} = k$. The first equality is from line 4 of Algorithm 1. We now prove the second equality. For each $j \in \{k + 1, \ldots, i\}$, value $m_j$ is extracted between $\mathrm{ext}(a)$ and $\mathrm{ins}(a)$. Then, our assumption leads to $m_j > a$. Moreover, because the algorithm checks at line 10 that extraction sequences included in the same hashcode are decreasing, we have $m_k \le a$, leading to the second equality.

We now prove the converse by contrapositive. Assume that some $\mathrm{ext}(b)$ between $\mathrm{ext}(a)$ and $\mathrm{ins}(a)$ satisfies $b \le a$. Since we forbid duplicates, in fact $b < a$. Let $j$ be the current block index right after processing $\mathrm{ext}(b)$. Then line 10 ensures that $m_j \le b$. Again, $k$ is the current block index right after processing $\mathrm{ext}(a)$, and therefore $k \le j$. If $k = j$, then the extraction sequence is not decreasing and line 10 rejects, contradicting the hypothesis that the algorithm has not rejected yet after processing $\mathrm{ins}(a)$. Therefore $k < j$. But line 4 and the fact that $m_j \le b$ imply that $k' \ge j$, and therefore $k < k'$. □

We now give the missing part of the proof of Theorem [H 

Lemma 19. If $\rho' \notin [t_B, t_C]$, then Algorithm 3 rejects $w$ with probability at least $1 - N^{-c}$.

Proof. We prove that $h_B^{\rightarrow}$ is unbalanced when $w[t_C]$ is processed and that $h_C^{\leftarrow}$ is unbalanced when $w[t_B]$ is processed. From that, we deduce that the algorithm rejects with high probability unless $m_B \le m_C$ and $m_C \le m_B$, i.e. $m_B = m_C$, which is impossible because $w$ has no duplicates and $B$ and $C$ are disjoint.

Indeed, if $m_C < m_B$ then Algorithm 3 checks that $h_B^{\rightarrow} = 0$ at line 18 when processing $w[t_C]$, and rejects with high probability because $h_B^{\rightarrow}$ is unbalanced. Similarly, if $m_C > m_B$, it rejects with high probability at line 23 when processing $w[t_B]$ on the right-to-left pass.

Now we only have to prove that $h_B^{\rightarrow}$ (resp. $h_C^{\leftarrow}$) is unbalanced when $w[t_C]$ (resp. $w[t_B]$) is processed. Let us assume there exists $B_1 \subseteq B$ such that $\mathrm{ins}(a)$ is included in $h_{B_1}^{\rightarrow}$ when $w[t_B]$ is processed. Then, by definition of $m_B$, $m_{B_1} \ge m_B$. Moreover, $\rho \in C$, so $w[\rho] = \mathrm{ext}(a)$ is not processed yet and not included in $h_{B_1}^{\rightarrow}$. Therefore, Algorithm 3 checks $h_{B_1}^{\rightarrow} = 0$ at line 18 and rejects w.h.p. We can now assume that there is no such $B_1 \subseteq B$, and therefore that $h_B^{\rightarrow}$ does not include $\mathrm{ins}(a)$ when $w[t_C]$ is processed. Since $h_B^{\rightarrow}$ includes $\mathrm{ext}(a)$, $h_B^{\rightarrow}$ is unbalanced when $w[t_C]$ is processed.

The proof for $h_C^{\leftarrow}$ is the same as above, replacing $h_B^{\rightarrow}$, $h_{B_1}^{\rightarrow}$, $B$, $B_1$, $t_B$ and $t_C$ with $h_C^{\leftarrow}$, $h_{C_1}^{\leftarrow}$, $C$, $C_1$, $t_C$ and $t_B$, and line 18 with line 23. □